#loading packages
library(DiagrammeR)
Missing data occurs when there are missing values in a dataset. There are many reasons why this occurs. It can be intentional or unintentional and can be classified into the following three categories, otherwise known as missingness mechanisms (Mainzer et al. 2023):
Missing completely at random (MCAR): the probability of a value being missing is completely independent of all variables, observed or unobserved.
Missing at random (MAR): the probability of a value being missing is related only to the observed values.
Missing not at random (MNAR): the probability of a value being missing depends on the missing values themselves, and possibly on the observed values as well.
Figure 1: Graphical Representation of Missingness Mechanisms (Schafer and Graham 2002)
(X are the completely observed variables. Y are the partly missing variables. Z is the component of the cause of missingness unrelated to X and Y. R is the missingness.)
Looking for patterns in the missing data can help us to determine which category it belongs to. These mechanisms are important in determining how to handle the missing data. MCAR would be the best-case scenario but seldom occurs. MAR and MNAR are more common.
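The three mechanisms can be made concrete with a small simulation. The sketch below is illustrative only: the variables `x` and `y`, the 30% missingness rate, and the logistic coefficients are assumptions chosen for the example, not values taken from the credit data.

```r
set.seed(1)
n <- 1000
x <- rnorm(n)              # fully observed variable
y <- 0.5 * x + rnorm(n)    # variable that will receive missing values

# MCAR: every value of y has the same 30% chance of being missing
r_mcar <- rbinom(n, 1, 0.3) == 1
# MAR: the chance of missingness depends only on the observed x
r_mar <- rbinom(n, 1, plogis(-1 + 1.5 * x)) == 1
# MNAR: the chance of missingness depends on the value of y itself
r_mnar <- rbinom(n, 1, plogis(-1 + 1.5 * y)) == 1

y_mcar <- ifelse(r_mcar, NA, y)
y_mar  <- ifelse(r_mar, NA, y)
y_mnar <- ifelse(r_mnar, NA, y)

# Compare the mean of the observed values with the full-data mean
means <- c(full = mean(y),
           mcar = mean(y_mcar, na.rm = TRUE),
           mar  = mean(y_mar, na.rm = TRUE),
           mnar = mean(y_mnar, na.rm = TRUE))
round(means, 2)
```

Under MCAR the observed mean should stay close to the full-data mean, while under MAR and MNAR the observed values are a selective subset, so a complete-case mean drifts away from it.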
Ignoring missing values does not give a true representation of the dataset and can bias the analysis. It also reduces the statistical power of the analysis (van Ginkel et al. 2020). To enhance the quality of the research, two practices should be followed: explicitly acknowledge missing data problems and the conditions under which they occur, and employ principled methods to handle the missing data (Dong and Peng 2013).
There are three types of methods to deal with missing data: the likelihood and Bayesian method, weighting methods, and imputation methods (Cao et al. 2021). Missing data can also be handled by simply deleting it.
The likelihood and Bayesian method combines information from a previous predictive distribution with evidence obtained in a sample to predict a value. It requires technical coding and advanced statistical knowledge.
The weighting method is a traditional approach in which weights from available data are used to adjust for non-response in a survey. Inefficiency occurs when there are extreme weights or a need for many weights.
The imputation method uses an estimate derived from the original dataset to fill in the missing value. There are two types of imputation: single and multiple.
Listwise deletion is when the entire observation is removed from the dataset. Deleting missing data can lead to the loss of important information regarding your dataset and is therefore not recommended. In certain cases, when the amount of missing data is small and the type is MCAR, listwise deletion can be used. There usually won’t be bias but potentially important information may be lost.
T-tests and chi-square tests can be used to compare cases with and without missing values on a given variable and determine whether the two groups differ significantly on the other variables. According to van Ginkel et al. (2020), if a test is significant, the null hypothesis is rejected, indicating that the missing values are not randomly scattered throughout the data. This implies that the missing data is MAR or MNAR. Conversely, if the tests are nonsignificant, this implies that the data cannot be MAR. It does not eliminate the possibility that the data is MNAR; other information about the population is needed to determine this.
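As a rough sketch of the testing idea described above, the example below builds a missingness indicator for one variable and tests whether the other variables differ between the "observed" and "missing" groups. The data are simulated, and the names `age`, `income`, and `home` are hypothetical stand-ins rather than columns of the credit dataset.

```r
set.seed(2)
n <- 500
age    <- rnorm(n, mean = 40, sd = 10)
income <- 50 + 2 * age + rnorm(n, sd = 20)
home   <- sample(c("owner", "rent"), n, replace = TRUE)

# Make income missing more often for older subjects (MAR given age)
income[rbinom(n, 1, plogis((age - 40) / 5)) == 1] <- NA

# Indicator of whether income is missing for each subject
miss <- factor(is.na(income), levels = c(FALSE, TRUE),
               labels = c("observed", "missing"))

# Continuous variable: compare mean age between the two groups
age_test <- t.test(age ~ miss)
# Categorical variable: test association between home and missingness
home_test <- chisq.test(table(home, miss))

c(age_p = age_test$p.value, home_p = home_test$p.value)
```

A small p-value for `age` indicates that missingness in `income` is related to `age`, which rules out MCAR for this simulated variable.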
Whenever missing data is categorized as MAR or MNAR, listwise deletion would be wasteful, and the analysis biased. Alternate methods of dealing with the missing data are recommended: either pairwise deletion or imputation.
Pairwise deletion is when only the missing variable of an observation is removed. It allows more data to be analyzed than listwise deletion but limits the ability to make inferences about the total sample. For this reason, it is recommended to use imputation to properly deal with missing data.
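The difference between listwise and pairwise deletion can be seen on a tiny made-up data frame. This is only a sketch; the columns `a`, `b`, and `c` are hypothetical.

```r
df <- data.frame(
  a = c(1, 2, NA, 4, 5, 6),
  b = c(2, NA, 3, 5, 4, 7),
  c = c(1, 3, 2, NA, 5, 6)
)

# Listwise deletion keeps only rows with no missing value at all
n_listwise <- sum(complete.cases(df))

# Pairwise deletion keeps, for each pair of variables,
# every row where both members of the pair are observed
observed   <- ifelse(is.na(as.matrix(df)), 0, 1)
n_pairwise <- crossprod(observed)

# The same distinction in a correlation matrix
cor_listwise <- cor(df, use = "complete.obs")
cor_pairwise <- cor(df, use = "pairwise.complete.obs")

n_listwise
n_pairwise
```

Each entry of `n_pairwise` is the number of rows available for that pair of variables, which shows why pairwise results rest on different subsamples and are harder to generalize to the total sample.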
Imputation is the preferred method to handle missing data. It consists of replacing missing data with an estimate obtained from the original, available data. After imputation, there will be a full dataset to analyze. To improve statistical power, the number of imputations created should be at least equal to the percent of missing data (5% equals 5 imputations, 10% equals 10 imputations, 20% equals 20 imputations, etc.) (Pedersen et al. 2017). According to Wulff and Jeppesen (2017), 3-5 imputations are sufficient, and 10 are more than enough.
Single, or univariate, imputation is when only one estimate is used to replace the missing data. Methods of single imputation include using the mean, the last observation carried forward, and random imputation. The following is a brief explanation of each:
Using the mean to replace a missing value is a straightforward process. The mean of the observed values of the variable is calculated, and that mean is substituted for every missing value of that variable. The problem with this method is that it reduces the variance, which leads to confidence intervals that are too narrow.
Last Observation Carried Forward (LOCF) is a technique of replacing a missing value in longitudinal studies with a previously observed value (the most recent value is carried forward) (Streiner 2008). The problem with this method is that it assumes the previously observed value stays unchanged, which most likely is not the case in reality.
Random imputation is a method of randomly drawing an observation and using that observation for any of the missing values. The problem with this method is that it introduces additional variability.
These single imputation methods are flawed. They often result in underestimation of standard errors or too small p-values (Dong and Peng 2013), which can cause bias in the analysis. Therefore, multiple imputation is the better method because it handles missing data better and provides less biased results.
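The three single imputation methods described above can be sketched in a few lines of base R. The vector `x` is made up for illustration, and the LOCF loop assumes the first value is observed.

```r
set.seed(3)
x <- c(12, 15, NA, 14, NA, 18, 16, NA, 13, 17)
miss <- is.na(x)

# Mean imputation: every gap receives the mean of the observed values
x_mean <- x
x_mean[miss] <- mean(x, na.rm = TRUE)

# LOCF: carry the most recent observed value forward
x_locf <- x
for (i in seq_along(x_locf)) {
  if (is.na(x_locf[i]) && i > 1) x_locf[i] <- x_locf[i - 1]
}

# Random imputation: draw replacements from the observed values
x_rand <- x
x_rand[miss] <- sample(x[!miss], sum(miss), replace = TRUE)

# Mean imputation shrinks the variance relative to the observed data
c(observed = var(x, na.rm = TRUE), mean_imputed = var(x_mean))
```

The variance comparison illustrates the underestimation problem: the imputed values add no spread, yet they increase the sample size used in the calculation.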
Multiple, or multivariate, imputation is when several estimates are used to replace each missing value, creating multiple completed versions of the original dataset. It can be done by using a regression model, or a sequence of regression models, such as linear, logistic and Poisson. A set of m plausible values is generated for each unobserved data point, resulting in m complete data sets (Dong and Peng 2013). The new values are randomly drawn from predictive distributions either through joint modeling (JM, which is not used much anymore) or fully conditional specification (FCS) (Wongkamthong and Akande 2023). Each completed data set is then analyzed, and the results are combined into a single set of estimates.
The purpose of multiple imputation is to create a pool of imputed data for analysis, but if the pooled results are lacking, then multiple imputation should not be done (Mainzer et al. 2023). Another reason not to use multiple imputation is if there are very few missing values; there may be no benefit in using it. Also worth noting is that some statistical analysis software already has built-in features to deal with missing data.
Multiple imputation by chained equations, otherwise known as MICE, is the most common and preferred method of multiple imputation (Wulff and Jeppesen 2017). It provides a more reliable way to analyze data with missing values. For this reason, this paper will focus on the methodology and application of the MICE process.
#loading packages
library(DiagrammeR)
Figure 2: Flowchart of the MICE process, based on procedures proposed by Rubin (Wulff and Jeppesen 2017)
DiagrammeR::grViz("digraph {
# initiate graph
graph [layout = dot, rankdir = LR, label = 'The MICE-Process\n\n',labelloc = t, fontcolor = DarkSlateBlue, fontsize = 45]
# global node settings
node [shape = rectangle, style = filled, fillcolor = AliceBlue, fontcolor = DarkSlateBlue, fontsize = 35]
bgcolor = none
# label nodes
incomplete [label = 'Incomplete data set']
imputed1 [label = 'Imputed \n data set 1']
estimates1 [label = 'Estimates from \n analysis 1']
rubin [label = 'Rubin rules', shape = diamond]
combined [label = 'Combined results']
imputed2 [label = 'Imputed \n data set 2']
estimates2 [label = 'Estimates from \n analysis 2']
imputedm [label = 'Imputed \n data set m']
estimatesm [label = 'Estimates from \n analysis m']
# edge definitions with the node IDs
incomplete -> imputed1 [arrowhead = vee, color = DarkSlateBlue]
imputed1 -> estimates1 [arrowhead = vee, color = DarkSlateBlue]
estimates1 -> rubin [arrowhead = vee, color = DarkSlateBlue]
incomplete -> imputed2 [arrowhead = vee, color = DarkSlateBlue]
imputed2 -> estimates2 [arrowhead = vee, color = DarkSlateBlue]
estimates2 -> rubin [arrowhead = vee, color = DarkSlateBlue]
incomplete -> imputedm [arrowhead = vee, color = DarkSlateBlue]
imputedm -> estimatesm [arrowhead = vee, color = DarkSlateBlue]
estimatesm -> rubin [arrowhead = vee, color = DarkSlateBlue]
rubin -> combined [arrowhead = vee, color = DarkSlateBlue]
}")
*Rubin’s Rules: Average the estimates across the m analyses. Calculate the standard errors and variance of the m estimates. Combine using an adjustment term (1+1/m).
There are other methods of imputation worth noting, and they are briefly described below.
Regression Imputation is based on a linear regression model. Missing values are randomly drawn from a conditional distribution when variables are continuous and from a logistic regression model when they are categorical (van Ginkel et al. 2020).
Predictive Mean Matching is also based on a linear regression model. The approach is the same as regression imputation when it comes to categorical missing values but different for continuous variables. Instead of random draws from a conditional distribution, missing values are based on predicted values of the outcome variable (van Ginkel et al. 2020).
Hot Deck (HD) imputation is when a missing value is replaced by an observed response of a similar unit, also known as the donor. It can be either random or deterministic (based on a metric or value) (Thongsri and Samart 2022). It does not rely on model fitting.
Stochastic Regression (SR) Imputation is an extension of regression imputation. The process is the same but a residual term from the normal distribution of the regression of the predictor outcome is added to the imputed value (Thongsri and Samart 2022). This maintains the variability of the data.
Random Forest (RF) Imputation is based on machine learning algorithms. Missing values are first replaced with the mean or mode of that particular variable and then the dataset is split into a training set and a prediction set (Thongsri and Samart 2022). The missing values are then replaced with predictions from these sets. This type of imputation can be used on continuous or categorical variables with complex interactions.
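To make the regression-based methods above more tangible, here is a simplified base R sketch on simulated data. It contrasts a plain regression prediction (no noise, shown only for comparison), stochastic regression imputation, and a stripped-down predictive mean matching. The variables `x` and `y`, the donor pool size `k = 5`, and the matching rule are assumptions for illustration; the versions implemented in imputation packages also account for uncertainty in the regression coefficients, which this sketch does not.

```r
set.seed(5)
n <- 300
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n, sd = 1.5)
y[sample(n, 90)] <- NA
miss <- is.na(y)

# lm() drops the rows with missing y, so the fit uses observed cases only
fit <- lm(y ~ x)
pred_all <- predict(fit, newdata = data.frame(x = x))

# Plain regression prediction: plug in the fitted value
y_reg <- y
y_reg[miss] <- pred_all[miss]

# Stochastic regression: fitted value plus a normal residual draw
y_sr <- y
y_sr[miss] <- pred_all[miss] + rnorm(sum(miss), sd = sigma(fit))

# Predictive mean matching: borrow an observed y from one of the
# k observed cases whose predicted value is closest
k <- 5
y_pmm <- y
for (i in which(miss)) {
  donors <- order(abs(pred_all[!miss] - pred_all[i]))[seq_len(k)]
  y_pmm[i] <- y[!miss][donors][sample.int(k, 1)]
}

round(c(observed   = var(y, na.rm = TRUE),
        regression = var(y_reg),
        stochastic = var(y_sr),
        pmm        = var(y_pmm)), 2)
```

Because predictive mean matching only ever copies observed values, its imputations always stay within the range of the real data, which is one reason it is a common default for numeric variables.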
Multiple Imputation by Chained Equations (MICE)
In multiple imputation, M imputed values are created for each missing value, resulting in M complete datasets, indexed by m = 1, ..., M. For each of the M datasets, an estimate \({\hat{\theta}}_{m}\) of \(\theta\) is acquired, together with an estimate \({\hat{\phi}}_{m}\) of its variance.
The combined estimator of \(\theta\) is given by:
\(\displaystyle {\hat{\theta}}_{M} = \frac{1}{M}\sum_{m = 1}^{M} {\hat{\theta}}_{m}\)
The proposed variance estimator of \({\hat{\theta}}_{M}\) is given by:
\(\displaystyle {\hat{\Phi}}_{M} = {\overline{\phi}}_{M} + \left(1 + \frac{1}{M}\right) B_{M}\)
where \(\displaystyle {\overline{\phi}}_{M} = \frac{1}{M}\sum_{m = 1}^{M} {\hat{\phi}}_{m}\)
and \(\displaystyle B_{M} = \frac{1}{M-1}\sum_{m = 1}^{M} \left({\hat{\theta}}_{m} - {\hat{\theta}}_{M}\right)^{2}\)
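The pooling formulas can be checked with a short worked example. The five estimates and their variances below are invented numbers, used only to show the arithmetic; the object names mirror the symbols in the formulas.

```r
# Hypothetical estimates of theta from M = 5 imputed datasets
theta_hat <- c(1.9, 2.1, 2.0, 2.3, 1.7)
# Hypothetical variance estimates (squared standard errors) from each analysis
phi_hat <- c(0.040, 0.050, 0.045, 0.050, 0.040)

M <- length(theta_hat)

theta_M <- mean(theta_hat)                      # combined estimate
phi_bar <- mean(phi_hat)                        # within-imputation variance
B_M     <- sum((theta_hat - theta_M)^2) / (M - 1)  # between-imputation variance
Phi_M   <- phi_bar + (1 + 1 / M) * B_M          # total variance

c(estimate = theta_M, within = phi_bar, between = B_M,
  total = Phi_M, se = sqrt(Phi_M))
```

The total variance is larger than the average within-imputation variance, which is how multiple imputation carries the uncertainty about the missing values into the final standard error.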
The chained equation process has the following steps (Azur et al. 2011):
1. Using a simple imputation, such as the mean, replace every missing value with a temporary value, referred to as the “place holder”.
2. The “place holder” values for one variable are set back to missing.
3. The observed values from this variable (dependent variable) are regressed on the other variables (independent variables) in the model, using the same assumptions as when performing linear, logistic, or Poisson regression.
4. The missing values for this variable are replaced with predictions (imputations) from this newly fitted model.
5. Repeat Steps 2-4 for each variable that has missing values until all missing values have been replaced. This constitutes one cycle.
6. Repeat Steps 2-4 for as many cycles as required, updating the imputations each cycle. The values from the final cycle form one imputed dataset, and the whole procedure is repeated to obtain each of the m imputed datasets.
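The steps above can be sketched as a small loop in base R. This is a simplified illustration, not the algorithm as implemented in any package: the data are simulated, the variables `x1`, `x2`, and `x3` are hypothetical, only linear regression is used, and the draws ignore uncertainty in the regression coefficients. It produces one imputed dataset; repeating it with different random draws would give the m datasets.

```r
set.seed(6)
n  <- 200
x3 <- rnorm(n)
x1 <- 1 + x3 + rnorm(n)
x2 <- 2 - x1 + 0.5 * x3 + rnorm(n)
dat <- data.frame(x1, x2, x3)
dat$x1[sample(n, 30)] <- NA
dat$x2[sample(n, 40)] <- NA

miss <- is.na(dat)              # remember where the gaps were
incomplete <- c("x1", "x2")     # variables that need imputing

# Step 1: fill every gap with a place holder (the column mean)
imp <- dat
for (v in incomplete) imp[miss[, v], v] <- mean(dat[[v]], na.rm = TRUE)

n_cycles <- 10
for (cycle in seq_len(n_cycles)) {
  for (v in incomplete) {
    obs <- !miss[, v]
    # Steps 2-3: regress the observed values of v on all other variables
    form <- reformulate(setdiff(names(imp), v), response = v)
    fit  <- lm(form, data = imp[obs, ])
    # Step 4: replace the gaps in v with predictions plus residual noise
    imp[!obs, v] <- predict(fit, newdata = imp[!obs, ]) +
      rnorm(sum(!obs), sd = sigma(fit))
  }
}

colSums(is.na(imp))
```

Each variable is imputed using the current imputations of the others, which is what "chained" refers to; the observed values are never altered.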
# load data
credit = read.csv("credit_data.csv")
# load libraries
library(gtsummary)
library(dplyr, warn.conflicts=FALSE)
library(mice, warn.conflicts=FALSE)
Credit score data
The credit.csv file is from the website of Dr. Lluís A. Belanche Muñoz, by way of a GitHub repository of Dr. Gaston Sanchez. It contains data on 4,454 subjects and stores a combination of continuous, categorical and count values for 15 variables. Of the 15 variables, the “Status” variable contains binary categorical values of “good” and “bad” to describe the kind of credit score each subject has. One data point was missing an outcome and was removed from the original data.
| Variable | Type | Description |
|---|---|---|
| X | Integer | Count variable indicating the number of subjects. |
| Status | Character | 2-level categorical variable indicating the status of the subject’s credit: good or bad. |
| Seniority | Integer | Count variable indicating the seniority a subject has accumulated over the course of their life. |
| Home | Character | 6-level categorical variable indicating the subject’s relationship to their residential address: rent, owner, parents, priv, other, or ignore. |
| Time | Integer | Count variable showing how many months have elapsed since the subject’s payment deadline without paying their debt in full. |
| Age | Integer | Count variable indicating subject’s age (in years). |
| Marital | Character | 5-level categorical variable indicating the subject’s marital status: single, married, separated, divorced, or widow. |
| Records | Character | 2-level categorical variable indicating whether the subject has a credit history record: yes or no. |
| Job | Character | 4-level categorical variable indicating the type of job the subject has: fixed, freelance, partime, or others. |
| Expenses | Integer | Count variable indicating the amount of expenses (in USD) a subject has. |
| Income | Integer | Count variable indicating the amount of income (in thousands of USD) a subject earns annually. |
| Assets | Integer | Count variable indicating the amount of assets (in USD) a subject has. |
| Debt | Integer | Count variable indicating the amount of debt (in USD) a subject has. |
| Amount | Integer | Count variable indicating the amount of money (in USD) remaining in a subject’s bank account. |
| Price | Integer | Count variable indicating the amount of money a subject earns by the end of the month. |
credit %>%
  tbl_summary(by = Status,
              missing_text = "NA") %>%
  add_p() %>%
  add_n() %>%
  add_overall() %>%
  modify_header(label ~ "**Variable**") %>%
  modify_caption("**Summary of Credit Data**") %>%
  bold_labels()
| Variable | N | Overall, N = 4,454¹ | bad, N = 1,254¹ | good, N = 3,200¹ | p-value² |
|---|---|---|---|---|---|
| X | 4,454 | 2,228 (1,114, 3,341) | 2,222 (1,142, 3,366) | 2,232 (1,098, 3,326) | 0.3 |
| Seniority | 4,454 | 5 (2, 12) | 2 (1, 6) | 7 (2, 14) | <0.001 |
| Home | 4,448 | | | | <0.001 |
| ignore | | 20 (0.4%) | 9 (0.7%) | 11 (0.3%) | |
| other | | 319 (7.2%) | 146 (12%) | 173 (5.4%) | |
| owner | | 2,107 (47%) | 390 (31%) | 1,717 (54%) | |
| parents | | 783 (18%) | 233 (19%) | 550 (17%) | |
| priv | | 246 (5.5%) | 84 (6.7%) | 162 (5.1%) | |
| rent | | 973 (22%) | 388 (31%) | 585 (18%) | |
| NA | | 6 | 4 | 2 | |
| Time | 4,454 | 48 (36, 60) | 48 (36, 60) | 48 (36, 60) | <0.001 |
| Age | 4,454 | 36 (28, 45) | 34 (27, 42) | 36 (28, 46) | <0.001 |
| Marital | 4,453 | | | | <0.001 |
| divorced | | 38 (0.9%) | 14 (1.1%) | 24 (0.8%) | |
| married | | 3,241 (73%) | 829 (66%) | 2,412 (75%) | |
| separated | | 130 (2.9%) | 64 (5.1%) | 66 (2.1%) | |
| single | | 977 (22%) | 328 (26%) | 649 (20%) | |
| widow | | 67 (1.5%) | 19 (1.5%) | 48 (1.5%) | |
| NA | | 1 | 0 | 1 | |
| Records | 4,454 | 773 (17%) | 429 (34%) | 344 (11%) | <0.001 |
| Job | 4,452 | | | | <0.001 |
| fixed | | 2,805 (63%) | 580 (46%) | 2,225 (70%) | |
| freelance | | 1,024 (23%) | 333 (27%) | 691 (22%) | |
| others | | 171 (3.8%) | 68 (5.4%) | 103 (3.2%) | |
| partime | | 452 (10%) | 271 (22%) | 181 (5.7%) | |
| NA | | 2 | 2 | 0 | |
| Expenses | 4,454 | 51 (35, 72) | 49 (35, 75) | 52 (35, 68) | 0.8 |
| Income | 4,073 | 125 (90, 170) | 100 (74, 148) | 130 (100, 178) | <0.001 |
| NA | | 381 | 217 | 164 | |
| Assets | 4,407 | 3,000 (0, 6,000) | 0 (0, 4,000) | 4,000 (0, 7,000) | <0.001 |
| NA | | 47 | 20 | 27 | |
| Debt | 4,436 | 0 (0, 0) | 0 (0, 0) | 0 (0, 0) | 0.3 |
| NA | | 18 | 13 | 5 | |
| Amount | 4,454 | 1,000 (700, 1,300) | 1,100 (800, 1,415) | 1,000 (700, 1,250) | <0.001 |
| Price | 4,454 | 1,400 (1,117, 1,692) | 1,423 (1,062, 1,728) | 1,400 (1,134, 1,678) | >0.9 |

¹ Median (IQR); n (%)
² Wilcoxon rank sum test; Pearson's Chi-squared test
First, we evaluate the dataset for missing values. As indicated in the table, the data does contain NA/missing values. We can create a table that shows each variable and how many missing values it has:
# Shows which variables have missing values and how many
colSums(is.na(credit))
        X    Status Seniority      Home      Time       Age   Marital   Records
        0         0         0         6         0         0         1         0
      Job  Expenses    Income    Assets      Debt    Amount     Price
        2         0       381        47        18         0         0
We now must analyze the data to decide how to handle the missing values. To do this, we create a new dataset, called new_credit, that deletes the rows with missing data. We preserve the original dataset so that we can later apply the method we choose to address the missing values. We can then count the rows to determine how many were deleted in total.
# Creates a new dataset excluding missing values
new_credit = na.omit(credit)
# Number of rows of new dataset
nrow(new_credit)
[1] 4039
We started out with 4,454 rows and our new dataset has 4,039. 415 rows were deleted due to the missing data. To run regression, we would be throwing away 9.3% of our data, because of missingness. Instead, we can use multiple imputation to impute the missing values so that we don’t have to discard such valuable information.
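The figures quoted above follow directly from the two row counts. As a quick check, with the counts typed in by hand rather than read from the data:

```r
n_total    <- 4454   # rows in the original dataset
n_complete <- 4039   # rows remaining after na.omit()

n_dropped   <- n_total - n_complete
pct_dropped <- round(100 * n_dropped / n_total, 1)

c(dropped = n_dropped, percent = pct_dropped)
```

With the data loaded, the same quantities could be obtained from `nrow(credit)` and `nrow(new_credit)`.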
Using the mice (Multivariate Imputation by Chained Equations) package in R, a statistical programming language, we will create multiple datasets with imputed values for the missing values. Because just under 10% of the rows in our dataset contain missing data, we will generate 10 imputations, or 10 new datasets. The mice package does this by generating plausible values based on the other columns and placing them into the cells where data is missing.
The first step is to check the missingness by looking for patterns in the original dataset using the md.pattern() function:
# Drop the first column (X, the row index) before examining patterns
credit <- credit[-c(1)]
md.pattern(credit, rotate.names = TRUE)
Status Seniority Time Age Records Expenses Amount Price Marital Job Home
4039 1 1 1 1 1 1 1 1 1 1 1
366 1 1 1 1 1 1 1 1 1 1 1
22 1 1 1 1 1 1 1 1 1 1 1
7 1 1 1 1 1 1 1 1 1 1 1
8 1 1 1 1 1 1 1 1 1 1 1
4 1 1 1 1 1 1 1 1 1 1 1
3 1 1 1 1 1 1 1 1 1 1 0
2 1 1 1 1 1 1 1 1 1 1 0
1 1 1 1 1 1 1 1 1 1 0 1
1 1 1 1 1 1 1 1 1 1 0 0
1 1 1 1 1 1 1 1 1 0 1 1
0 0 0 0 0 0 0 0 1 2 6
Debt Assets Income
4039 1 1 1 0
366 1 1 0 1
22 1 0 1 1
7 1 0 0 2
8 0 0 1 2
4 0 0 0 3
3 0 0 1 3
2 0 0 0 4
1 1 1 0 2
1 0 0 0 5
1 1 1 1 1
18 47 381 455
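In the plot produced by md.pattern(), blue cells are observed values and red cells are missing values. In the printed matrix, 1 indicates an observed value and 0 a missing value. There are 11 distinct missingness patterns.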
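For readers who want to see how such a pattern table is built, the sketch below counts missingness patterns by hand in base R. It uses a small made-up data frame, `toy`, with hypothetical values rather than the credit data, and follows the same coding of 1 for observed and 0 for missing.

```r
toy <- data.frame(
  Income = c(100, NA, 120, NA, 90, 110),
  Assets = c(3000, 0, NA, NA, 4000, 5000),
  Debt   = c(0, 0, 0, NA, 0, 0)
)

# Code each cell as 1 (observed) or 0 (missing)
observed <- ifelse(is.na(toy), 0, 1)

# Collapse each row into a pattern string, e.g. "101"
pattern <- apply(observed, 1, paste, collapse = "")

# Count how many rows share each pattern
pattern_counts <- table(pattern)
pattern_counts
```

Each distinct string corresponds to one row of the pattern matrix, and its count corresponds to the number shown at the start of that row.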
In order to perform multiple imputation on categorical data, all string variables must be converted to factors using the as.factor() function:
credit$Status = as.factor(credit$Status)
credit$Home = as.factor(credit$Home)
credit$Marital = as.factor(credit$Marital)
credit$Records = as.factor(credit$Records)
credit$Job = as.factor(credit$Job)
Using the mice() function, 10 imputations of the missing values will be generated. The default is 5, so m must be set to the desired number of imputations. Because the dataset contains both numerical and categorical variables (with two or more levels), the defaultMethod argument is set to: pmm, predictive mean matching (numeric data); logreg, logistic regression imputation (binary data, factor with 2 levels); polyreg, polytomous regression imputation for unordered categorical data (factor with more than 2 levels); and polr, the proportional odds model (ordered factor with more than 2 levels). The seed argument is given the value 1337 (any number can be used here) so that the same results are obtained each time the multiple imputation is performed.
# seed (not set.seed) is the mice() argument that fixes the random number generator
Multiple_Imputation = mice(data = credit, maxit = 10, m = 10, defaultMethod = c("pmm", "logreg", "polyreg", "polr"), seed = 1337)
iter imp variable
1 1 Home Marital Job Income Assets Debt
1 2 Home Marital Job Income Assets Debt
1 3 Home Marital Job Income Assets Debt
1 4 Home Marital Job Income Assets Debt
1 5 Home Marital Job Income Assets Debt
1 6 Home Marital Job Income Assets Debt
1 7 Home Marital Job Income Assets Debt
1 8 Home Marital Job Income Assets Debt
1 9 Home Marital Job Income Assets Debt
1 10 Home Marital Job Income Assets Debt
2 1 Home Marital Job Income Assets Debt
2 2 Home Marital Job Income Assets Debt
2 3 Home Marital Job Income Assets Debt
2 4 Home Marital Job Income Assets Debt
2 5 Home Marital Job Income Assets Debt
2 6 Home Marital Job Income Assets Debt
2 7 Home Marital Job Income Assets Debt
2 8 Home Marital Job Income Assets Debt
2 9 Home Marital Job Income Assets Debt
2 10 Home Marital Job Income Assets Debt
3 1 Home Marital Job Income Assets Debt
3 2 Home Marital Job Income Assets Debt
3 3 Home Marital Job Income Assets Debt
3 4 Home Marital Job Income Assets Debt
3 5 Home Marital Job Income Assets Debt
3 6 Home Marital Job Income Assets Debt
3 7 Home Marital Job Income Assets Debt
3 8 Home Marital Job Income Assets Debt
3 9 Home Marital Job Income Assets Debt
3 10 Home Marital Job Income Assets Debt
4 1 Home Marital Job Income Assets Debt
4 2 Home Marital Job Income Assets Debt
4 3 Home Marital Job Income Assets Debt
4 4 Home Marital Job Income Assets Debt
4 5 Home Marital Job Income Assets Debt
4 6 Home Marital Job Income Assets Debt
4 7 Home Marital Job Income Assets Debt
4 8 Home Marital Job Income Assets Debt
4 9 Home Marital Job Income Assets Debt
4 10 Home Marital Job Income Assets Debt
5 1 Home Marital Job Income Assets Debt
5 2 Home Marital Job Income Assets Debt
5 3 Home Marital Job Income Assets Debt
5 4 Home Marital Job Income Assets Debt
5 5 Home Marital Job Income Assets Debt
5 6 Home Marital Job Income Assets Debt
5 7 Home Marital Job Income Assets Debt
5 8 Home Marital Job Income Assets Debt
5 9 Home Marital Job Income Assets Debt
5 10 Home Marital Job Income Assets Debt
6 1 Home Marital Job Income Assets Debt
6 2 Home Marital Job Income Assets Debt
6 3 Home Marital Job Income Assets Debt
6 4 Home Marital Job Income Assets Debt
6 5 Home Marital Job Income Assets Debt
6 6 Home Marital Job Income Assets Debt
6 7 Home Marital Job Income Assets Debt
6 8 Home Marital Job Income Assets Debt
6 9 Home Marital Job Income Assets Debt
6 10 Home Marital Job Income Assets Debt
7 1 Home Marital Job Income Assets Debt
7 2 Home Marital Job Income Assets Debt
7 3 Home Marital Job Income Assets Debt
7 4 Home Marital Job Income Assets Debt
7 5 Home Marital Job Income Assets Debt
7 6 Home Marital Job Income Assets Debt
7 7 Home Marital Job Income Assets Debt
7 8 Home Marital Job Income Assets Debt
7 9 Home Marital Job Income Assets Debt
7 10 Home Marital Job Income Assets Debt
8 1 Home Marital Job Income Assets Debt
8 2 Home Marital Job Income Assets Debt
8 3 Home Marital Job Income Assets Debt
8 4 Home Marital Job Income Assets Debt
8 5 Home Marital Job Income Assets Debt
8 6 Home Marital Job Income Assets Debt
8 7 Home Marital Job Income Assets Debt
8 8 Home Marital Job Income Assets Debt
8 9 Home Marital Job Income Assets Debt
8 10 Home Marital Job Income Assets Debt
9 1 Home Marital Job Income Assets Debt
9 2 Home Marital Job Income Assets Debt
9 3 Home Marital Job Income Assets Debt
9 4 Home Marital Job Income Assets Debt
9 5 Home Marital Job Income Assets Debt
9 6 Home Marital Job Income Assets Debt
9 7 Home Marital Job Income Assets Debt
9 8 Home Marital Job Income Assets Debt
9 9 Home Marital Job Income Assets Debt
9 10 Home Marital Job Income Assets Debt
10 1 Home Marital Job Income Assets Debt
10 2 Home Marital Job Income Assets Debt
10 3 Home Marital Job Income Assets Debt
10 4 Home Marital Job Income Assets Debt
10 5 Home Marital Job Income Assets Debt
10 6 Home Marital Job Income Assets Debt
10 7 Home Marital Job Income Assets Debt
10 8 Home Marital Job Income Assets Debt
10 9 Home Marital Job Income Assets Debt
10 10 Home Marital Job Income Assets Debt
The following R code will show the imputed values. Columns are imputations, rows are observations.
Multiple_Imputation$imp
$Status
[1] 1 2 3 4 5 6 7 8 9 10
<0 rows> (or 0-length row.names)
$Seniority
[1] 1 2 3 4 5 6 7 8 9 10
<0 rows> (or 0-length row.names)
$Home
1 2 3 4 5 6 7 8 9
30 parents owner priv priv priv rent owner other rent
240 owner owner owner owner owner other priv owner parents
1060 parents other owner parents parents parents parents parents parents
1677 other owner owner parents owner owner owner owner owner
2389 rent owner rent owner parents other rent rent other
2996 owner priv rent owner owner owner owner owner owner
10
30 rent
240 owner
1060 parents
1677 priv
2389 owner
2996 priv
$Time
[1] 1 2 3 4 5 6 7 8 9 10
<0 rows> (or 0-length row.names)
$Age
[1] 1 2 3 4 5 6 7 8 9 10
<0 rows> (or 0-length row.names)
$Marital
1 2 3 4 5 6 7 8 9
3319 married divorced widow married married married married married widow
10
3319 married
$Records
[1] 1 2 3 4 5 6 7 8 9 10
<0 rows> (or 0-length row.names)
$Job
1 2 3 4 5 6 7 8
30 others freelance fixed freelance fixed fixed partime fixed
912 partime partime fixed fixed partime partime partime freelance
9 10
30 freelance fixed
912 partime fixed
$Expenses
[1] 1 2 3 4 5 6 7 8 9 10
<0 rows> (or 0-length row.names)
$Income
1 2 3 4 5 6 7 8 9 10
30 105 219 90 27 152 95 85 140 62 80
114 105 80 70 109 70 32 87 110 120 16
144 144 80 210 250 115 184 330 959 80 320
153 113 80 131 118 50 69 67 57 124 140
158 220 129 159 139 196 115 205 140 140 132
177 250 254 245 300 250 300 94 250 178 254
195 187 156 95 220 100 104 90 100 250 158
206 200 75 181 130 150 100 110 179 224 250
241 160 155 180 250 166 68 110 177 150 68
242 80 129 189 50 196 177 126 135 188 172
278 80 105 100 153 153 100 143 140 184 197
318 101 80 95 105 79 228 85 130 41 92
330 95 120 140 150 95 240 100 135 122 230
333 223 210 190 136 98 110 167 35 190 193
335 132 163 132 200 126 96 140 117 168 144
356 163 125 70 100 142 125 92 114 141 117
360 70 133 80 107 50 80 105 132 234 59
394 150 350 350 500 500 500 491 150 500 500
404 25 105 200 55 115 108 144 123 110 147
422 165 242 46 123 205 120 98 120 160 110
439 120 212 175 119 223 95 110 110 120 180
444 67 167 70 100 150 95 93 250 75 107
462 139 168 147 243 220 128 230 150 77 100
469 120 210 144 78 142 154 110 150 366 95
479 300 60 102 112 100 227 86 90 135 250
481 200 158 113 132 100 170 105 50 205 200
483 72 233 165 130 77 184 205 84 120 100
485 125 200 175 147 144 63 154 366 240 147
496 100 80 187 113 102 82 90 60 100 150
498 72 101 208 245 100 170 158 125 85 150
505 75 81 171 125 222 107 57 134 76 103
567 113 113 172 104 176 158 64 179 160 126
572 91 82 65 57 80 67 140 124 181 121
582 70 182 35 49 67 182 65 19 56 56
648 250 250 191 125 210 60 260 50 500 288
653 62 161 180 260 92 97 122 130 110 195
667 416 416 91 245 241 250 142 245 250 230
675 107 230 250 250 88 53 208 166 42 250
678 126 115 120 185 225 107 60 140 73 140
699 184 137 67 60 176 130 188 100 172 120
708 340 124 72 235 107 80 52 104 179 71
714 90 105 48 85 99 142 75 132 283 100
716 80 93 42 80 103 38 70 52 114 86
733 136 124 124 366 168 119 70 123 104 150
734 92 178 8 60 130 160 155 133 125 107
746 85 63 135 110 185 98 140 140 118 110
777 104 70 75 75 192 65 136 150 70 209
781 134 100 140 149 122 117 155 87 143 160
785 150 120 170 65 200 165 314 464 150 464
804 100 211 71 122 121 135 300 110 210 145
824 86 113 175 175 150 55 138 105 182 50
865 120 123 80 86 148 90 100 149 127 204
866 100 92 100 100 128 107 82 100 157 62
880 315 102 154 50 129 108 104 107 140 212
889 428 442 172 190 111 150 41 350 200 156
906 210 350 225 214 100 382 70 235 122 394
912 120 53 120 80 150 158 135 113 95 128
942 69 65 75 38 70 79 82 140 110 106
952 139 215 115 144 225 188 140 125 121 130
989 78 100 80 103 147 76 71 74 50 8
1001 40 140 77 53 70 63 92 66 63 59
1017 145 70 233 135 117 68 113 63 185 150
1039 213 191 233 200 120 72 107 150 167 234
1044 140 149 75 75 90 78 72 80 90 80
1069 155 134 138 250 176 92 190 178 110 333
1100 62 102 40 125 94 22 136 87 77 107
1111 120 115 115 121 94 127 65 173 82 48
1125 40 315 166 178 184 139 300 100 153 306
1168 470 145 157 92 230 205 100 175 202 187
1208 245 60 150 160 96 117 158 220 200 214
1226 161 160 250 99 250 161 120 176 61 250
1250 67 200 86 102 201 150 117 100 160 64
1257 58 246 390 150 333 158 152 130 154 49
1276 426 125 210 182 200 117 178 85 45 90
1281 105 120 107 160 180 216 80 80 56 75
1289 72 75 63 65 162 75 90 103 103 38
1297 190 165 70 136 130 115 91 109 85 112
1307 96 245 110 110 96 90 121 195 177 172
1314 218 84 100 159 200 290 218 137 127 324
1335 185 117 180 100 117 70 178 134 213 146
1364 154 240 183 158 289 60 228 214 79 857
1365 185 145 165 170 129 125 120 250 53 212
1366 75 250 154 90 130 210 132 268 148 148
1392 500 491 491 491 350 500 150 150 500 500
1421 167 100 170 125 167 204 186 145 70 173
1427 90 230 100 270 196 200 90 156 143 100
1433 217 132 222 63 99 8 100 105 68 66
1436 81 100 105 85 176 113 147 159 156 87
1437 85 60 240 83 163 125 80 125 140 215
1441 67 134 104 94 86 60 144 110 139 65
1456 120 41 42 105 65 107 57 91 60 58
1473 110 120 235 200 118 107 413 165 113 133
1509 80 156 90 118 140 141 46 17 86 135
1513 56 110 87 75 50 65 75 135 150 120
1530 117 66 140 120 80 149 166 114 80 80
1535 200 108 178 135 57 138 100 184 112 107
1536 152 257 300 250 373 400 231 275 532 230
1544 100 107 111 110 100 109 160 105 195 166
1549 125 150 120 76 171 122 96 110 120 189
1564 97 122 95 100 110 25 180 164 56 120
1580 135 130 114 54 87 110 178 170 86 81
1583 131 110 116 450 87 88 150 42 78 195
1598 260 146 240 107 49 35 216 90 190 195
1599 64 106 100 80 165 109 115 220 82 140
1619 152 125 112 100 80 72 145 139 232 159
1629 364 200 52 67 142 160 125 101 236 75
1648 138 135 130 116 100 26 63 75 70 48
1662 95 236 80 245 95 104 121 200 107 135
1677 90 120 71 115 120 120 123 208 340 93
1685 250 92 76 145 125 85 180 104 210 103
1722 180 100 150 81 200 55 207 154 81 225
1724 120 37 161 200 118 118 112 184 108 111
1733 220 214 230 275 250 170 318 310 130 213
1741 92 77 55 86 95 63 70 60 49 105
1745 120 150 149 260 270 131 230 203 172 146
1753 60 136 50 72 82 60 50 136 126 135
1762 150 139 108 107 70 260 76 74 111 105
1766 93 200 128 107 857 150 275 191 315 104
1771 170 106 214 60 93 230 186 219 20 223
1798 108 100 112 118 263 231 180 80 143 70
1802 491 150 150 500 150 150 500 500 491 905
1803 69 89 131 80 144 225 120 150 90 160
1807 137 115 86 53 100 190 120 68 85 70
1811 69 90 146 100 199 75 143 221 100 150
1844 85 199 256 169 100 113 178 245 130 46
1851 700 250 275 178 350 250 120 125 230 260
1852 100 107 117 100 67 135 120 50 85 90
1870 330 150 352 60 177 205 177 202 251 71
1872 75 101 90 90 185 118 117 131 85 89
1882 136 60 72 77 48 100 120 50 92 71
1883 150 54 300 106 140 115 117 250 340 66
1893 150 350 150 905 200 491 491 500 491 150
1898 80 86 80 117 195 60 86 60 225 150
1903 47 139 100 202 60 57 60 57 120 92
1907 250 535 275 93 274 100 118 214 314 120
1920 60 101 93 74 82 35 100 116 135 123
1936 160 201 100 125 165 207 200 341 200 125
1946 137 101 55 100 236 112 120 109 90 79
1948 53 101 65 130 75 85 64 85 85 71
1962 115 114 179 104 318 148 55 100 110 120
1963 341 220 260 182 318 130 219 138 250 188
1965 86 80 91 60 176 300 80 130 200 80
1970 150 100 500 230 150 283 250 250 100 189
1972 500 500 491 491 500 150 500 500 905 200
1977 242 276 293 198 180 190 157 100 137 247
1979 100 109 186 90 100 105 180 95 85 107
1980 100 80 182 50 119 70 128 72 70 58
1984 274 143 218 41 60 195 186 145 200 240
2006 100 195 143 42 145 103 68 52 154 241
2016 260 230 100 140 180 199 250 114 280 214
2022 86 142 93 85 62 184 139 150 150 110
2025 266 251 191 148 313 220 68 245 289 300
2042 102 178 60 50 200 160 127 142 103 128
2043 16 96 140 95 140 81 65 190 80 163
2076 90 122 101 73 155 130 64 140 95 80
2077 152 169 38 160 240 139 123 200 187 52
2083 92 71 121 166 202 162 189 42 240 85
2156 150 62 202 260 77 300 85 198 200 96
2157 218 125 70 75 87 90 19 35 151 130
2186 110 60 105 200 140 234 92 140 92 125
2197 100 140 175 169 165 82 150 238 150 112
2205 100 42 150 428 315 210 115 150 380 180
2218 100 250 187 65 100 120 59 192 100 134
2227 50 30 55 46 51 72 50 70 63 138
2233 55 91 143 120 60 120 70 92 149 250
2240 341 174 66 78 189 100 209 274 120 208
2257 106 76 184 118 120 220 88 162 88 115
2280 275 260 178 137 180 133 321 531 257 180
2291 293 146 101 300 224 140 190 158 113 384
2297 91 70 84 119 135 115 81 150 128 138
2304 53 52 85 42 56 70 32 135 31 33
2310 150 111 315 150 360 532 74 230 500 106
2323 70 26 76 135 45 130 56 58 65 121
2331 500 500 241 905 500 150 241 905 241 491
2337 51 80 27 42 152 80 73 114 113 75
2349 120 500 303 120 120 300 100 142 186 165
2365 700 155 100 158 200 160 200 390 170 212
2369 500 140 147 199 230 178 230 225 190 121
2387 135 219 66 75 116 150 120 90 139 72
2396 80 63 105 68 120 125 98 74 440 140
2399 150 200 117 121 500 81 223 138 466 155
2402 265 200 199 700 200 191 183 150 79 300
2404 139 157 110 150 300 63 106 165 150 200
2437 100 151 151 250 350 115 257 137 315 275
2445 170 121 25 128 181 115 200 75 199 93
2446 76 140 145 144 123 129 220 187 146 120
2453 115 109 178 75 110 170 115 113 78 81
2460 110 75 78 139 95 60 92 45 45 90
2467 40 63 37 109 130 59 50 120 72 73
2473 41 70 65 129 116 128 150 82 110 98
2490 85 96 82 42 85 182 52 88 46 152
2495 80 84 125 130 63 78 99 48 85 63
2505 318 107 93 268 260 200 183 340 273 130
2566 148 93 92 120 129 116 142 221 327 70
2572 150 128 140 148 137 112 125 80 160 143
2578 176 112 62 116 90 87 160 164 83 319
2584 69 75 110 48 42 130 100 79 150 100
2596 67 78 60 83 92 120 60 80 166 72
2605 100 90 100 126 123 82 73 55 75 85
2614 110 114 191 126 206 60 85 125 131 180
2624 75 100 210 104 180 118 41 61 39 120
2625 155 162 165 61 155 43 177 160 107 150
2631 91 175 110 98 76 24 125 212 115 208
2632 137 90 68 63 126 60 140 103 88 137
2651 53 104 83 102 130 100 147 140 185 150
2652 148 94 100 104 79 108 70 150 120 170
2653 250 90 60 88 105 92 130 81 90 125
2668 213 400 50 85 64 177 250 90 170 144
2676 115 107 89 72 160 132 61 121 170 260
2681 163 110 77 99 100 100 250 128 100 220
2683 80 106 103 142 160 85 120 90 60 71
2695 141 90 80 169 160 90 107 158 152 47
2696 92 67 127 83 73 156 47 105 92 87
2707 200 119 142 90 81 157 104 140 90 137
2720 210 211 210 164 250 100 53 110 478 70
2723 100 147 57 45 87 83 60 56 123 63
2725 148 50 200 857 165 65 175 208 183 315
2730 155 92 178 67 185 159 180 90 120 133
2769 70 90 70 58 85 105 251 129 89 45
2780 40 110 45 32 85 109 85 66 119 109
2781 106 183 186 173 207 133 60 150 390 230
2802 90 161 130 90 81 154 168 167 226 150
2805 100 130 162 81 182 156 300 100 77 142
2806 166 150 92 148 99 50 131 99 131 57
2807 60 50 90 120 110 105 270 105 93 182
2810 113 85 136 110 150 200 65 92 97 110
2813 300 160 143 66 167 40 123 265 123 215
2815 60 80 166 225 82 283 125 98 60 93
2825 130 148 223 140 157 260 150 310 155 200
2854 171 140 42 466 65 228 150 300 266 212
2869 42 213 90 122 138 69 177 99 121 145
2882 100 70 95 65 100 60 95 83 335 78
2884 130 93 125 80 80 190 57 136 283 113
2893 160 100 135 67 65 142 190 163 211 181
2915 81 187 169 140 64 300 90 210 153 125
2927 21 62 72 120 59 80 112 125 149 27
2935 88 86 50 134 120 195 90 102 73 90
2936 200 174 158 86 200 128 205 424 170 150
2939 115 130 120 173 122 137 85 100 53 115
2951 500 183 200 245 500 178 416 416 91 178
2954 300 250 200 700 120 125 260 144 230 959
2969 171 150 95 105 189 117 211 116 315 197
2971 300 124 211 230 100 123 164 185 173 173
2979 94 94 240 171 300 531 300 142 120 300
2983 57 197 110 99 116 85 76 250 128 80
2991 101 95 60 70 66 51 80 72 76 65
2996 120 157 66 82 67 104 118 268 160 69
2999 114 305 300 700 174 50 220 129 177 236
3008 300 60 288 125 125 60 250 137 275 459
3014 120 135 160 115 35 200 175 88 160 210
3021 233 42 130 192 110 100 102 198 210 270
3026 90 140 32 75 200 164 100 100 81 110
3031 90 74 73 175 33 182 81 60 60 130
3038 67 53 42 53 67 121 121 52 33 121
3040 340 211 42 298 210 320 345 128 158 92
3069 436 165 142 95 200 260 198 63 268 109
3080 120 106 117 92 121 150 92 140 101 83
3096 253 70 126 170 84 110 150 178 81 160
3104 76 64 8 150 75 168 165 125 130 129
3106 160 107 204 216 120 131 181 178 141 110
3110 74 190 191 150 266 111 371 41 166 183
3121 157 90 200 359 464 383 222 80 292 67
3123 72 130 130 188 210 231 167 187 243 171
3139 400 250 150 230 90 200 373 459 321 120
3167 55 85 162 110 57 210 128 120 52 80
3170 218 176 140 135 128 155 67 231 133 124
3183 67 180 80 120 100 145 43 179 114 148
3185 419 217 110 95 126 75 100 124 112 75
3187 156 125 78 63 100 97 85 159 113 245
3203 120 59 88 164 67 87 59 117 100 63
3218 78 114 70 52 75 70 140 80 55 105
3222 99 201 230 187 120 180 120 108 90 66
3229 150 112 77 135 139 62 100 73 132 45
3233 464 53 53 212 247 300 103 230 191 176
3237 123 210 145 135 103 158 215 142 256 95
3245 139 150 97 100 130 55 75 140 120 95
3252 125 107 140 47 95 280 80 88 172 116
3266 187 197 206 160 352 244 105 95 225 215
3286 136 100 70 65 129 89 160 60 225 96
3288 75 132 150 290 208 250 82 275 42 200
3304 200 491 178 491 150 183 183 200 178 491
3310 142 210 156 75 134 180 125 55 250 80
3316 177 197 237 102 159 121 101 124 196 124
3325 16 96 170 140 151 120 176 107 73 124
3336 190 39 100 179 114 120 135 112 110 123
3338 245 200 416 183 245 245 183 241 230 200
3345 83 80 39 211 195 78 225 35 214 400
3352 168 318 156 235 107 80 63 242 155 173
3365 140 297 219 70 156 131 105 149 165 38
3382 176 130 118 101 110 96 193 62 63 137
3433 80 66 92 75 140 75 93 77 130 102
3439 209 110 82 65 116 65 75 52 106 209
3451 85 139 105 92 42 139 70 136 77 50
3452 65 62 92 66 123 82 43 75 80 85
3454 109 233 173 120 63 131 107 300 145 90
3456 75 95 140 90 158 250 67 96 60 130
3461 133 90 154 240 140 120 132 121 101 300
3462 114 140 120 257 115 72 110 93 86 100
3473 78 160 120 65 185 110 80 87 95 160
3477 78 125 145 115 89 100 214 90 121 140
3478 300 90 152 70 165 93 123 150 145 140
3494 80 122 47 92 62 125 227 220 140 90
3513 129 113 180 160 100 88 225 172 100 145
3523 150 90 160 130 158 97 315 40 60 95
3525 78 109 83 116 95 169 100 168 160 150
3534 47 152 190 134 100 296 150 126 210 110
3556 203 61 186 205 246 218 200 125 214 146
3641 150 35 165 189 120 110 320 107 320 150
3645 122 232 110 125 113 105 95 72 90 115
3657 134 120 75 100 166 126 95 210 90 33
3674 42 175 120 90 120 48 151 173 53 63
3679 300 190 113 183 106 110 114 100 114 150
3691 178 122 203 170 70 139 135 144 120 175
3704 247 105 81 190 77 100 172 150 140 110
3709 300 150 235 72 240 143 130 475 50 116
3714 75 110 115 209 80 136 111 60 60 140
3717 190 110 114 140 130 67 105 120 63 90
3730 150 188 76 174 285 140 100 165 174 91
3740 141 350 80 90 156 135 160 140 135 106
3763 95 48 180 116 25 106 78 210 85 105
3768 115 131 201 96 145 100 247 192 324 81
3773 175 100 459 400 715 300 400 459 156 230
3794 142 130 72 179 82 106 112 102 102 217
3800 60 56 105 55 80 166 71 99 70 82
3823 16 83 176 156 140 78 66 141 133 50
3825 42 33 121 121 53 121 121 49 49 33
3850 130 120 80 65 182 88 100 46 148 200
3855 53 60 125 250 250 173 90 124 126 135
3857 70 85 102 70 59 82 139 88 100 108
3858 105 70 110 130 80 75 139 63 110 77
3882 65 124 108 170 164 100 36 86 70 260
3887 145 157 200 168 139 240 230 130 250 134
3892 20 111 89 183 154 100 110 174 146 211
3902 116 234 80 105 205 175 132 164 240 138
3914 83 190 115 73 173 129 58 166 134 150
3928 160 500 430 60 137 350 300 144 325 178
3932 21 158 117 120 283 140 78 100 125 141
3945 152 270 128 145 66 81 77 140 350 65
3946 175 144 428 70 214 50 642 65 70 79
3947 23 60 56 120 35 85 67 55 50 88
3951 120 17 177 193 70 280 145 95 107 135
3955 100 150 100 66 52 60 70 40 62 60
3966 105 166 80 163 81 98 106 131 140 42
3992 185 58 71 100 40 85 100 120 90 200
4003 208 80 139 104 134 136 225 130 164 115
4023 340 223 105 105 57 207 220 250 186 270
4036 60 219 106 90 65 92 85 62 175 139
4049 67 49 121 121 53 42 67 88 49 49
4064 50 140 78 104 210 123 98 125 60 148
4069 340 276 107 135 470 147 20 190 117 220
4076 31 56 80 88 49 63 165 19 37 121
4082 200 211 108 74 60 470 80 100 225 110
4085 400 300 382 384 350 170 90 229 165 101
4096 54 136 200 193 135 110 244 25 85 107
4119 125 120 82 102 135 190 74 85 126 135
4159 102 72 110 34 141 17 146 40 126 127
4168 52 117 66 136 100 103 70 72 132 100
4173 47 105 92 71 75 135 90 70 130 139
4181 75 70 140 53 125 98 65 95 92 149
4191 310 196 131 120 142 85 141 375 130 120
4198 189 138 111 959 151 204 176 330 442 250
4199 200 110 154 70 130 179 192 82 100 90
4222 79 120 79 72 180 200 60 78 150 92
4223 66 110 116 70 166 88 92 51 200 131
4237 185 117 114 170 113 92 100 140 132 125
4246 182 182 169 165 125 97 100 92 265 142
4247 65 100 25 60 150 70 129 60 105 150
4256 250 150 298 191 128 148 157 250 150 122
4281 99 125 74 38 139 83 150 60 105 19
4295 100 87 124 96 60 80 96 96 65 51
4333 90 130 60 52 75 47 110 55 82 147
4349 160 104 155 130 70 125 95 80 155 210
4368 125 119 92 128 118 140 110 36 58 46
4373 175 121 123 102 122 100 87 247 90 170
4398 85 155 100 115 169 111 115 67 126 63
4411 250 160 140 356 133 212 700 93 180 113
4420 150 491 905 905 241 905 491 905 905 200
4433 120 60 154 131 65 175 80 210 185 180
4436 92 80 125 52 144 110 145 140 78 70
4440 101 178 134 190 98 217 120 138 154 172
4441 260 200 248 162 85 205 162 168 107 246
$Assets
1 2 3 4 5 6 7 8 9 10
30 15000 10000 1500 4000 0 0 18 0 2500 0
240 4500 32000 19400 6000 60000 4000 8000 8000 4500 39000
735 7000 7500 15000 5500 15000 10500 14000 18000 20000 80000
1060 3000 0 4000 4500 0 0 700 0 2000 3500
1129 11500 6000 5000 8500 5000 10000 12000 2500 8000 20000
1670 2500 0 3500 0 4000 0 3500 14000 4000 4000
1677 0 6000 10000 10000 8500 8500 40000 25000 13500 8500
1812 1000 25000 6500 10000 3000 30000 6000 60000 8000 6000
1845 0 0 0 0 0 0 0 0 0 1500
1878 2000 0 0 0 0 4000 1000 2500 0 0
1893 26000 150000 26000 80000 30000 20000 150000 50000 150000 200000
2074 3500 30000 0 0 18000 5000 5000 3500 3000 4700
2237 0 0 0 0 0 1800 3000 0 800 0
2291 7750 10000 7000 14000 2500 8000 7000 30000 10000 3000
2368 0 0 0 0 0 0 3000 3500 0 0
2389 0 0 0 0 0 12000 0 0 0 5500
2439 0 0 0 0 0 0 0 0 0 0
2449 13000 8000 8000 0 7000 10000 10000 10000 6000 5000
2473 0 5500 5000 3000 8000 6000 9000 4000 4000 6500
2530 5000 4162 3500 5000 4500 6000 3000 4000 12000 6000
2653 5000 7500 0 7100 10000 8000 10000 7000 2500 78000
2720 7000 7000 4000 0 25000 3500 25000 11000 6000 8000
2772 6000 0 0 3000 3000 2000 4000 2500 8000 6500
2857 5000 4000 0 0 3500 0 0 0 4000 0
2951 30000 0 30000 28000 8000 25000 90000 65000 50000 29500
2996 0 11000 0 0 16000 21500 5500 8000 15000 15000
3053 15000 9000 3500 7500 4500 10000 5000 4000 7000 11000
3183 5000 8000 4300 14000 5000 9000 0 4000 7500 5000
3196 5000 5000 7000 15000 5000 11000 5500 25000 11500 35000
3218 2000 8000 3000 2500 4000 2500 7500 0 4500 6000
3229 9000 9000 3500 10000 7000 6000 2000 0 10000 3000
3330 0 0 0 0 4000 0 0 0 0 0
3440 3500 3500 11500 2000 5000 7000 60000 4000 5000 6000
3549 0 0 4000 0 3000 0 3500 7000 4000 0
3647 15000 9000 6000 8000 3000 6000 7000 5000 12000 5000
3652 0 0 0 0 0 0 0 0 0 0
3661 3000 5000 15000 0 0 5000 4000 0 4000 6000
3821 0 45000 0 6000 2000 4000 4000 3000 13500 7500
4035 25000 5000 5000 0 5000 3000 1600 0 0 3500
4074 6000 30000 3000 6500 8000 2000 4650 20000 10000 15000
4111 8000 7000 4000 15000 3500 3500 2200 6000 8000 3000
4119 6000 8000 8000 11000 5000 30000 3000 9000 2500 10000
4168 24000 5000 3000 8000 9000 6000 7000 4000 20000 15000
4187 14000 0 0 0 0 0 0 0 0 0
4192 3500 3500 4500 4000 0 2800 4000 15000 4000 3500
4288 0 0 0 4500 0 3500 4000 0 0 0
4446 3000 4000 0 10000 6000 6000 6500 17000 6000 4000
$Debt
1 2 3 4 5 6 7 8 9 10
30 0 2000 500 0 0 0 0 0 0 0
240 1300 1000 0 480 0 0 0 0 0 3000
1060 0 0 0 0 0 0 0 0 0 0
1677 960 1000 0 0 0 0 0 0 0 0
1812 0 0 0 0 0 1408 0 0 0 0
1845 0 0 0 0 0 0 0 0 0 0
1878 0 0 500 0 0 0 600 0 360 1749
1893 0 500 0 15000 0 0 0 0 0 15000
2074 0 0 500 0 3130 0 3000 0 0 3378
2237 0 0 0 0 0 0 0 0 0 0
2389 0 0 0 500 0 0 108 0 0 0
2449 0 500 0 0 0 0 0 0 0 0
2653 2000 0 0 0 1500 0 0 0 0 0
2951 9300 0 0 1300 0 0 0 0 933 3000
2996 0 360 0 0 500 0 0 0 3700 0
3218 0 2000 0 0 1500 2800 0 0 0 3000
4074 600 0 0 0 0 0 0 0 0 0
4288 0 0 0 0 0 0 0 0 0 0
$Amount
[1] 1 2 3 4 5 6 7 8 9 10
<0 rows> (or 0-length row.names)
$Price
[1] 1 2 3 4 5 6 7 8 9 10
<0 rows> (or 0-length row.names)
We can check the quality of the imputations with a strip plot, a single-axis scatter plot that shows the distribution of each variable in each imputed data set. The imputed values should be plausible, meaning they could have been observed had the data not been missing.
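As a numeric complement to the plots, the same plausibility idea can be sketched by comparing the imputed values with the range of the observed values. The sketch below uses made-up numbers rather than the credit data. The `observed` vector and the `imputed` data frame are hypothetical stand-ins that only mirror the layout of one element of the imputation output (rows are cases with a missing value, columns are imputation numbers).

```r
# Hypothetical observed values of one variable (not taken from the credit data)
observed <- c(80, 120, 150, 95, 200, 310, 60, 175)

# Hypothetical imputed values: one row per case with a missing value,
# one column per imputation number
imputed <- data.frame(
  `1` = c(100, 250, 90),
  `2` = c(130, 180, 75),
  `3` = c(95, 305, 400),
  check.names = FALSE,
  row.names = c("30", "240", "735")
)

obs_range <- range(observed)

# Flag imputed values that fall outside the observed range
outside <- imputed < obs_range[1] | imputed > obs_range[2]

# Share of out-of-range imputed values in each imputed data set
share_outside <- colMeans(outside)
print(share_outside)
```

A non-zero share does not by itself mean the imputations are wrong, but it points to the values worth inspecting in the strip plot.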
par(mfrow=c(7,2))
stripplot(Multiple_Imputation, Status, pch = 19, xlab = "Imputation number")
stripplot(Multiple_Imputation, Seniority, pch = 19, xlab = "Imputation number")
stripplot(Multiple_Imputation, Home, pch = 19, xlab = "Imputation number")
stripplot(Multiple_Imputation, Time, pch = 19, xlab = "Imputation number")
stripplot(Multiple_Imputation, Age, pch = 19, xlab = "Imputation number")
stripplot(Multiple_Imputation, Marital, pch = 19, xlab = "Imputation number")
stripplot(Multiple_Imputation, Records, pch = 19, xlab = "Imputation number")
stripplot(Multiple_Imputation, Job, pch = 19, xlab = "Imputation number")
stripplot(Multiple_Imputation, Expenses, pch = 19, xlab = "Imputation number")
stripplot(Multiple_Imputation, Income, pch = 19, xlab = "Imputation number")
stripplot(Multiple_Imputation, Assets, pch = 19, xlab = "Imputation number")
stripplot(Multiple_Imputation, Debt, pch = 19, xlab = "Imputation number")
stripplot(Multiple_Imputation, Amount, pch = 19, xlab = "Imputation number")
stripplot(Multiple_Imputation, Price, pch = 19, xlab = "Imputation number")
Next, we fit the analysis model to each imputed data set and pool the results to arrive at estimates that properly account for the missing data. The with() function fits the model on every imputed data set, pool() combines the fits, and summary() displays the pooled results: the estimate, standard error, test statistic, degrees of freedom, and p-value for each term.
# fit complete-data model
fit <- with(Multiple_Imputation, glm(Status ~ Seniority + Home + Time + Age + Marital + Records + Job + Expenses + Income + Assets + Debt + Amount + Price, family = binomial))
# pool and summarize the results
summary(pool(fit))
   term                  estimate    std.error    statistic         df      p.value
1  (Intercept)       9.672533e-01 7.341406e-01   1.31753137 4218.92797 1.877321e-01
2  Seniority         8.302704e-02 7.509116e-03  11.05683361 4026.09993 5.120915e-28
3  Homeother         6.847181e-02 5.740463e-01   0.11927924 4408.20935 9.050596e-01
4  Homeowner         1.151566e+00 5.598295e-01   2.05699445 4422.65325 3.974528e-02
5  Homeparents       9.460827e-01 5.683979e-01   1.66447262 4416.87452 9.608895e-02
6  Homepriv          4.143875e-01 5.773215e-01   0.71777591 4418.33662 4.729334e-01
7  Homerent          4.162157e-01 5.632987e-01   0.73888974 4409.85773 4.600133e-01
8  Time              6.772940e-05 3.486236e-03   0.01942766 4226.07583 9.845009e-01
9  Age              -1.092253e-02 5.000167e-03  -2.18443281 4164.62555 2.898604e-02
10 Maritalmarried    6.137484e-01 4.234126e-01   1.44952805 2987.91483 1.472951e-01
11 Maritalseparated -6.762074e-01 4.664549e-01  -1.44967375 3525.81248 1.472385e-01
12 Maritalsingle     1.581997e-01 4.283363e-01   0.36933532 3172.24218 7.119025e-01
13 Maritalwidow      1.736481e-01 5.328093e-01   0.32591039 3649.32932 7.445108e-01
14 Recordsyes       -1.785754e+00 1.021443e-01 -17.48265446 4339.76452 3.435636e-66
15 Jobfreelance     -7.612312e-01 1.020269e-01  -7.46107977 4198.03391 1.037176e-13
16 Jobothers        -7.117123e-01 2.047266e-01  -3.47640329 3215.47838 5.149181e-04
17 Jobpartime       -1.472669e+00 1.260182e-01 -11.68615527 4373.09140 4.329561e-31
18 Expenses         -1.521722e-02 2.637894e-03  -5.76870255 3189.84443 8.752264e-09
19 Income            7.267654e-03 8.237279e-04   8.82288088   86.99121 1.036059e-13
20 Assets            2.169205e-05 6.920445e-06   3.13448801  158.81379 2.051064e-03
21 Debt             -1.715864e-04 3.765713e-05  -4.55654458  232.74078 8.396911e-06
22 Amount           -1.949321e-03 1.722167e-04 -11.31900125 3937.75396 2.977718e-29
23 Price             8.748681e-04 1.266726e-04   6.90652888 3894.76779 5.775505e-12
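To make the pooling step less of a black box, here is a minimal base R sketch of Rubin's rules, the combination rule that pool() is documented to apply to the point estimates and standard errors. The function name `pool_rubin` and the input numbers are hypothetical and are not taken from the fitted model above. The degrees-of-freedom adjustment that pool() also reports is deliberately left out.

```r
# Combine one coefficient's estimates and standard errors across
# m imputed data sets using Rubin's rules (illustrative helper, not part of mice)
pool_rubin <- function(estimates, std_errors) {
  m <- length(estimates)
  q_bar <- mean(estimates)        # pooled point estimate
  u_bar <- mean(std_errors^2)     # within-imputation variance
  b <- var(estimates)             # between-imputation variance
  total_var <- u_bar + (1 + 1 / m) * b
  c(estimate = q_bar, std.error = sqrt(total_var))
}

# Hypothetical per-imputation results for a single predictor (m = 5)
est <- c(0.0070, 0.0075, 0.0072, 0.0068, 0.0078)
se <- c(0.0006, 0.0006, 0.0007, 0.0006, 0.0006)

pooled <- pool_rubin(est, se)
print(pooled)
```

The between-imputation term is what pushes the pooled standard error above the average single-data-set standard error. In the pooled summary above, the much smaller degrees of freedom for Income, Assets, and Debt are consistent with a larger between-imputation component for those terms.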
In conclusion, missing data can occur in research for a variety of reasons, and it is never a good idea to ignore it. Ignoring it can lead to biased parameter estimates, loss of information, decreased statistical power, and weak reliability of findings (Dong and Peng 2013). The recommended course of action is to impute the missing data using multiple imputation. When missing data is discovered, first identify it and look for missing data patterns. Next, choose the variables in the dataset that are related to the missing values and will be used for imputation. Create the necessary number of complete data sets, fit the analysis model to each of them, and finally pool the results into a single set of estimates. Performing these steps will minimize the adverse effects of missing data on the analysis (Pampka, Hutcheson, and Williams 2016).